Model-based Sparse Component Analysis for Multiparty Distant Speech Recognition

Author

Afsaneh Asaei

Abstract

This thesis addresses multi-microphone distant speech recognition in multiparty meetings, and in particular the fundamental problem of recognizing overlapping speech in reverberant rooms. Motivated by the excellent performance of human hearing on this problem, which possibly results from the sparsity of the auditory representation, our work aims to exploit sparse component analysis in the speech recognition front-end to extract the components of the desired speaker from the competing interference (other speakers) prior to recognition. More specifically, speech recovery and recognition are achieved by sparse reconstruction of the (high-dimensional) spatio-spectral information embedded in the acoustic scene from the (low-dimensional) compressive recordings provided by a few microphones. This approach exploits the naturally parsimonious structure of the data, which pertains to the geometry of the problem as well as to the information representation space. Our contributions are articulated around four blocks. The structured sparse spatio-temporal representation of the concurrent sources is constructed, along with the characterization of the compressive acoustic measurements. A framework to simultaneously identify the locations of the sources and their spectral components is derived using the model-based sparse recovery approach and, finally, the acoustic multipath and sparsity models are incorporated for effective multichannel signal acquisition relying on beamforming. The work is evaluated on real data recordings. The results provide compelling evidence of the effectiveness of structured sparsity models for multi-party speech recognition. The thesis establishes a new perspective on the analysis of multichannel recordings as compressive acquisition and recovery of the information embedded in the acoustic scene.
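
The idea of recovering a high-dimensional sparse vector from a few compressive measurements can be illustrated with a toy sketch. The code below is not the thesis's model-based (structured) recovery method; it swaps in plain ISTA (iterative soft thresholding) for an unstructured L1-regularized least-squares problem. The random matrix `A`, the sizes `n_mics` and `n_cells`, the regularization weight `lam`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam=0.05, n_iter=500):
    """Plain ISTA for the objective above, with fixed step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2  # squared spectral norm: Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Hypothetical toy setup: 8 measurements ("microphones") observing a
# 64-dimensional vector with only two active entries ("sources").
rng = np.random.default_rng(0)
n_mics, n_cells = 8, 64
A = rng.standard_normal((n_mics, n_cells))    # assumed measurement matrix
x_true = np.zeros(n_cells)
x_true[[5, 40]] = [1.0, -0.7]
y = A @ x_true                                # low-dimensional compressive recording

x_hat = ista(A, y)
print("largest recovered entries at indices:", np.argsort(-np.abs(x_hat))[:2])
```

With so few measurements, exact recovery of the active entries is not guaranteed in this unstructured setting, which is one motivation for adding structure to the sparsity model as the abstract describes.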

Similar resources

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian combination (for clean speech and noise, respectively) in the HMM ...

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing, where voice characteristics are used to investigate the identity of an individual. In this paper, a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and an atom correction post-processing method. Similar to genera...

Sparse component analysis for speech recognition in multi-speaker environment

Sparse Component Analysis is a relatively young technique that relies on a representation in which a signal occupies only a small part of a larger space. Mixtures of sparse components are disjoint in that space. As a particular application of the sparsity of speech signals, we investigate the DUET blind source separation algorithm in the context of speech recognition for multiparty recordings. We show h...

A New IRIS Segmentation Method Based on Sparse Representation

Iris recognition is one of the most reliable methods for identification. In general, it consists of image acquisition, iris segmentation, feature extraction and matching. Among them, iris segmentation plays an important role in the performance of any iris recognition system. Nonlinear eye movement, occlusion, and specular reflection are the main challenges for any iris segmentation method. In thi...
